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Abstract 

We provide in this paper simulation algorithms for one-sided and two-sided truncated 
normal distributions. These algorithms are then used to simulate multivariate normal 
variables with restricted parameter space for any covariance structure. 
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1. Introduction 

The need for simulation of truncated normal variables appears in Bayesian inference 
for some truncated parameter space problems. Indeed, it is rarely the case that analyt- 
ical computations are possible and numerical integration can be very intricated for large 
dimensions. Typical examples of such setups can be found in order restricted (or isotonic) 
regression, as illustrated in Robertson, Wright and Dykstra (1988). For instance, one can 
consider a n x n table of normal random variables Xij with means $ij which are increasing 
in % and j (1 < i,j < n), as in Dykstra and Robertson (1982). When n is large, both 
maximum likelihood and Bayesian inferences on this table can be quite cumbersome and 
simulation techniques are then necessary to either obtain mle's by stochastic restoration 
(see Qian and Titterington, 1991) or Bayes estimators by Gibbs sampling (see Gelfand and 
Smith, 1990). Gibbs sampling actually provides a large set of examples where simulation 
from truncated distributions is necessary, for instance for censored models since the re- 
covery of the censored observations implies simulation from the corresponding truncated 
distribution, as shown in details by Gelfand, Smith and Lee (1992). See also Chen and 
Deely (1992) who propose a new version of the Gibbs sampler for estimating the ordered 
coefficients of a regression model. 

We first construct in Section 2 an efficient algorithm for unidimensional truncated 
normal variables. This algorithm is quite simple and, in the particular case of one-sided 
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truncated normal distributions, it slightly improves on a previous algorithm developed 
by Marsaglia (1964). Our multidimensional extension in Section 3 is also based on this 
algorithm. Actually, we propose to use Gibbs sampling to reduce the simulation problem 
to a sequence of one-dimensional simulations. The resulting sample, being derived from a 
Markov chain, is not independent, but can be used similarly for all estimation purposes. 



2. The univariate case 

2.1. One-sided truncation. Let us denote A/+(/U, fx~ , a 2 ) the truncated normal distri- 
bution with left truncation point ix~ , i.e. the distribution with density 

_ 2 exp(-(z - fx) 2 /2a 2 ) 

fixing ,<r)= — ——I x> 

V2na(l -$((//-- aO/°0) 

Obviously, a readily available method is to simulate from a normal distribution J\f(ix, o~ 2 ) 
until the generated number is larger than fi~ . This method is quite reasonable when 
\i~ < fx but is of no use when \x~ is several standard deviations to the right of fx. Similarly, 
Gelfand et al. (1992) and Chen and Deely (1992) suggest to use the classical c.d.f. inversion 
technique, namely to simulate u ~ W[o,i] an d to take 



as the simulation output, but this method calls for a simultaneous evaluation of the normal 
c.d.f. $ and of its inverse and may be quite inefficient if fx~ — fx is large, since 

the precision of the approximation of $ then strongly matters. We provide below an 
accept-reject algorithm which is more efficient than repeatedly simulating from the normal 
distribution as soon as fx~ > fx. In the sequel, we will assume without loss of generality that 
fx = and a 2 = 1, since the usual location-scale rescaling allows to standardize truncated 
normal variables. 

Let us recall first that the general accept-reject algorithm is based on the following 
result (see Devroye, 1985, pp. 40-60). 

Lemma 2.1 Let h and g be two densities such that h(x) < Mg(x) for every x in 
the support of h. The random variable x resulting from the following algorithm 

1. Generate z ~ g{z); 

2. Generate u ~ U[o,i\- If u < h(z)/Mg(z), take x = z; otherwise, repeat from 
step 1. 
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is distributed accorded to h. 

In our case, a possible choice for g is the translated exponential distribution £xp(a, fx~) 
with density 

g(z\a,fj,-) =ae- a ^") I Z > M _. 

Since, for z > \i~ , we have 

e «(z-M~) e -2 2 /2 < e « 2 /2-M"a 

if a > /U~ and 

e «(z- M -) e - 2 2 /2 < e -(M") 2 /2 

if a < /tx , the constant M is given by 

.. e" 2 / 2 -^" if a > //-, 

V2ir(l-*(/i-)) — ^ ' 

7=7- " — rre - ^ ) / 2 otherwise 
and the ratio h(z)/Mg(z) by 

_ f e -z 2 /2+a(z-n-)-a 2 /2+afi- \{ a > ^~ ^ 

Mg(z) ~ \ e -^ 2 /2+a( 2 -^-)+(^-) 2 /2 otherwise.' 
We then derive from Lemma 2.1 the corresponding accept-reject algorithm. 

Lemma 2.2 The following algorithm 

1. Generate z ~ £xp(a, fx~); 

2. Compute g(z) = exp(— (a — z) 2 /2) if \i~ < a and g(z) = exp(( / u~ — ct) 2 /2) 
exp(— (a — z) 2 /2) otherwise; 

3. Generate u ~ W[o,i] an d take x = z if u < g(z); otherwise, repeat from step 
1. 

leads to the generation of a random variable from A/+(0, fj,~ , 1). 
Now, noticing that the probability of acceptance in one single run is 



.2 , 

i2 



ae Qfl " a / 2 $(-^-) v / 2tF if//" < a, 



o; e (M ) / 2 $(— ^ )v / 27r otherwise, 
we deduce that the optimal scale factor in the exponential distribution attained for 



H~ + v/(^-) 2 + 4 
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in the first case and for a = fx in the second case. Furthermore, since the corresponding 
probabilities are proportional to 

and fj,~ exp(( / u~) 2 /2) respectively, with the same coefficient of proportionality, it can be 
shown by using the reparametrization in a* (i.e. fx~ = a* — 1/a*) that the first probability 
is always greater and that the best choice of a is ot*(fx~). Therefore, 

Proposition 2.3 The optimal exponential accept-reject algorithm to simulate 
from a A/+(0, fx~ , 1) when fx~ > is given by 

1. Generate z ~ £xp(a*, fx~); 

2. Compute g(z) = exp{-(z - a*) 2 /2}; 

3. Generate u ~ W[o,i] an d ^ a ^ e x = z if u < g(z); otherwise, go back to step 1. 

Table 2.1 below gives the expected probability IE Q *[£?(z)] for several values of yT . 
It shows the gain brought by using this accept-reject algorithm since the probability of 
accepting in one passage is 0.760 for yT = 0, as compared with 0.5 for the repeated 
normal sampling alternative. The improvement increases as pT goes away from and 
the probability of accepting goes to 1 as \i~ goes to infinity. Note that the probability of 
accepting is greater than 

probability of accepting for a = fx~; this is also the rate obtained by Marsaglia (1964) 
when proposing an accept-reject algorithm using the tail of a Raleigh distribution (see also 
Devroye, 1985, pp. 380-382). The improvement brought by using £xp(a*, fx~) is significant 
for the moderate values of \i~ '. Those large probabilities also hint at likely improvements 
over repeated normal sampling even when \i~ < 0, but such developments would call for 
much more elaborated algorithms and, moreover, fast normal generators can overcome the 
advantages of using a more complex algorithm. 

fi~ 0.5 1 1.5 2 2.5 3 

E a .[e(z)] 0.760 0.826 0.876 0.910 0.934 0.950 0.961 

Table 2.1 - Average probability of acceptance 
according to the truncation point \i~ . 

Simulation from the right truncated normal distribution, x ~ Af-(fi, fi + ,a 2 ), can be 
directly derived from the above algorithm since — x ~ A/+(— fx, —fx + ,a 2 ). We consider in 
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the next section the simulation from the two-sided truncated normal distribution for which 
modifications of the above algorithm are necessary. 



2.2. Two-sided truncated normal distribution. When considering the two-sided 
truncated normal distribution fx~ , fx + , er 2 ), with density 

f{x\n,n ,fx + ,a) = 



2™ [*((/*+ - - - ^)/a))] ' 

the simulation method heavily depends on the range fi + — pT . As before, a first possibility 
is to simulate from a N(fx, o~ 2 ) distribution until z G [fx~ , fi + ] (or even to invert the c.d.f.). 
However, if — fx) — $>(fx~ — fx) is small or even if (fx~ — fx)(fi + — fx) > 0, more 

efficient alternatives are available. We propose here to consider, in addition to the previous 
algorithms, an accept-reject approach based on the uniform W^-^+j distribution. Once 
again, we can assume without loss of generality that fx = and a 2 = 1. 

The accept-reject algorithm based on W^-^+j is 

1. Generate z ~ U^-^+y, 

2. Compute 

( exp(-^ 2 /2) if 0e[[i-,fi+} 

g(z) = { exp({(fx+) 2 -z 2 }/2) if fx+ < 
[exp({(^-) 2 - 2 2 }/2) if 0<ft- 

3. Generate u ~ ZY[o,i] an d take x = z if u < g(z); otherwise, go back to step 1. 
The corresponding expected probability of running the above algorithm only once is 

M+ 2/o, e d 



JE[g(z)] = I e~ z2/2 dz 
J n~ 



fl+ - fX~ 



fl fl 

where d = 0, (fx + ) 2 /2 or (fx~) 2 /2 whether fx + fx~ < 0, /i + < or /i" > 0. Therefore, when 
fx + fx~ < 0, it is more efficient to use this algorithm rather than to use the repeated normal 
method if fx + — fx~ < \^2n. 

We now oppose simulation from the uniform algorithm to repeated simulation from 
a one-sided truncated normal distribution. For instance, if fx~ > 0, we simulate z ~ 
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•A/+(0, \i , 1) until z < fi + . Using the optimal algorithm of Proposition 2.3, the probability 
of accepting in one passage is 

P{u < q{z) and z < 

= / e- (z ~ a } / 2 a*e~ a } dz 

J fJL~ 

= a*e aV "- (a * )2/2 v / 2T($( / u + ) - $(/0) 



= a e 



/ 27r/e (*(**+)- *(/*"))■ 



Therefore, it is better to use the truncated jV+(0, \i , 1) algorithm if 



— / pr 

..* „a jU /2 ' "~ 



i.e. if 



2^ J^ 2 -^V^ 2 +4 
^+ > /i- + J exp ^ ± ) . (2.1) 



Figure 2.1 - Lower bound (2.1) on n + 
for the use of the truncated normal algorithm. 
Figure 2.1. provides the lower bound of (2.1) as a function of \i~ . Note that, as \i~ 
increases, the range fu, + — [f has to get smaller for uniform accept-reject sampling to be 
used. The corresponding decomposition is straightforward to derive when \x + < 0. Table 
2.2 below gives the expected probabilities of acceptance in one run for several values of [x~ 
and /i + — [i~ . 
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.998 
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.927 


.905 



Table 2.2 - Average probabilities of acceptance 
for the simulation of ATt(0, \i~ , 1). 

3. The multivariate case 

We consider now a multivariate normal distribution Af p (fi, £) restricted to a convex 
subset 1Z of IR P , denoted A/" T (/U, E, 1Z). We assume that the one-dimensional slices of 1Z, 

TZi(6i, . . . , 0i-i, 0i+i, ■ ■ ■ ,0 P ) = {9i; (6>i, . . . , 6i-i, 0i, • • • , P ) e 71} , 

are readily available, in the sense that these sets can be represented as intervals [0~ , Of] , 
where the bounding functions 0~ and Of, depending on (0\, . . . , • • • , P ), are 

easily computable (1 < i < p). 

The algorithm we propose below belongs to the class of Markov Chain Monte- Carlo 
methods (as referred to in Hastings (1970) and Geyer (1991)). Namely, instead of gener- 
ating a sequence 0k of i.i.d. random vectors from the distribution of interest, we provide 
a sequence 0( n ) which is a Markov chain with stationary distribution the distribution of 
interest. Such an approximation may seem to fall far from the mark but results like the 
ergodic theorem ensure that the average of any quantity of interest f(0), 

1 N 

n=l 

is converging to the expectation IE [/(#)] as iV goes to infinity, thus generalizing the law 
of large numbers. More details on the application of Markov chain theory in this setup 
are given in Ripley (1987, pp. 113-114), Geyer (1991) and Tierney (1991). Following the 
early Metropolis algorithm (Metropolis et a/., 1953), Markov chain Monte-Carlo simulation 
methods have been used extensively in the past years in Gibbs sampling theory for Bayesian 
computation (see Tanner and Wong (1987), Gelfand and Smith (1990) and Tanner (1991)). 
The main difficulty of this approach, as opposed to usual (independent) Monte-Carlo 
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methods, is to monitor the convergence of the chain to the stationary distribution. Apart 
from classical central limit theorem (see Geyer, 1991) and time-series methods (see Ripley, 
1987, chap. 6), one can suggest the simultaneous estimation of several quantities until 
approximate stationarity of the corresponding averages (3.1) is attained for all functions. 
Gelman and Rubin (1991) also suggest to run several times the algorithm with drastically 
different starting values. In our particular setup, convergence to the stationary distribution 
should be particularly fast since the compactness of 1Z ensures geometric convergence (see 
Tierney, 1991). 

In the setup of truncated normal distributions, the Markov chain 9^ is obtained by 
generating successively the components of A/" T (//, E, 1Z), i.e. 

i. e[ n) ~ a/^ien^- 1 ), . . . , <s> r , e+, o*) 
P . ~ Af+(no P \e[ n \ . . . , 4) 

where the expectations and variances in the above truncated normal distributions are the 
conditional (non-truncated) expectations and variances of the 9i given 9-,i = (9\, . . . , 9i-i, 
. . . ,9 P ). Namely, we have 

where E-,^ is the (p — 1) x (p — 1) matrix derived from E = (afj) by eliminating its z-th 
row and its i-th column and Ej-,j is the (p — 1) vector derived from the i-th column of E 
by removing the i-th row term. 

Moreover, it is important to note that there is no need to invert all the matrices 
to run the algorithm. Indeed, it is possible to derive these inverses from the global inverse 
matrix V = E _1 since they can be written 

E-^ = V^-V^V^/V^, (3.2) 

where V-.^-.^ and "V^i are derived from V the way £-,j-,i and £j-,i are derived from E. 
Therefore, the algorithm only requires at most one inversion of E and the computation of 
the submatrices E"^ by (3.2). 

The comparison with a classical rejection-sampling method based on the simulation of 
x ~ Np{n, E) until the result belongs to 1Z is quite delicate, depending on the probability 
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P(x E TZ) but also on the overall purpose of the simulation. In fact, if this probability is 
rather large and a single observation from A/" T (/U, E, TV) is needed, it is clear that rejection 
sampling is preferable. On the contrary, if a large sample is needed, as it is the case 
for Gibbs sampling and related maximum likelihood methods, then the Markov chain 
Monte-Carlo method should be superior, especially if 1Z is small, since as mentioned above, 
convergence of the Gibbs sampler to the stationary distribution should be fast. 

As a concluding remark, let us consider the following example. The truncated distri- 
bution of interest is 

(«;H T (°'E i]4 

with truncation space 1Z the ball £>(7,r) of center 7 = (71,72) and radius r. Therefore, 

Mfc) = 71 - Vr 2 - (72 - o 2 )\ e+(e 2 ) = 7l + v^ 2 - (72 - # 2 ) 2 , 

= 72 - y/r 2 -(li-0i) 2 , 0+(9 1 ) = 72 + ^r 2 - (71 - #i) 2 
and the conditional distributions defining the Markov chain are 

1. e[ n) ~at+ (^ B - 1 \(?r(^- 1) ),(9f(^- 1) ),i-e 2 ) 

2. ~ at- (^ n) ,^ 2 -(^ n) ), ^ 2 + (^ n) ), 1 - q 2 ) . 
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